A method and system for synchronizing threads within a
process

A method and system is described for synchronizing
execution by a processing element of threads within a
process. Before execution of a thread commences, a
determination is made as to whether all of the required
resources for execution of the thread are available in a
cache local to the processing element. If the resources
are not available, then the resources are fetched from
main storage and stored in one or more local caches
before execution begins. If the resources are available,
then execution of the thread may begin. During execution
of the thread and, in particular, an instruction within
the thread, the instruction may require data in order to
successfully complete its execution. When this occurs, a
determination is made as to whether the necessary data is
available. If the data is available, the result of the
instruction execution is stored and execution of the
thread continues. However, if the data is unavailable,
then the thread is deferred until the data becomes
available and a new thread is processed. When deferring a
thread, the thread is placed in the memory location which
is to receive the required data. Once the data is
available, the thread is removed from the data location
and placed on a queue for execution and the data is
stored in the location.

TECHNICAL FIELD

This invention relates in general to data
synchronization within a parallel data processing
system, and more particularly to the synchronization of
threads within a process being executed by a processing
element.

BACKGROUND ART

In parallel data processing systems, programs to be
executed may be divided into a number of processes which
may be executed in parallel by a plurality of processing
elements. Each process includes one or more threads and
each thread includes a group of sequentially executable
instructions. The simultaneous execution of a number of
threads requires synchronization or time-coordination of
the activities associated with each thread. Without
synchronization a processor may sit idle for a great
deal of time waiting for data it requires, thereby
degrading system performance and utilization.

A thread located in one process is capable of
communicating with threads in another process or in the
same process and therefore, various levels of
synchronization are required in order to have an
efficiently executing system with a high degree of
system performance.

In order to synchronize the communication of threads
located in different processes, a synchronization
mechanism, such as I-structures, may be used.
I-structures are used in main storage and are described
in I-structures: Data Structures for Parallel Computing
by Arvind, R.S. Nikhil and K.K. Pingali, Massachusetts
Institute of Technology Laboratory for Computer Science,
February 1987.

Synchronization of threads communicating between
different processes does not negate the need for a
synchronization mechanism used to synchronize threads
within the same process. Therefore, a need still exists
for an efficient manner to synchronize threads within a
process thereby providing greater system utilization and
performance. A need also exists for a synchronization
mechanism of threads within a process wherein the
synchronization mechanism is local to the processing
element in which the threads are executed. A further
need exists for a synchronization mechanism which does
not place a constraint on the number of processes and
threads which may be executed by the processing element
due to the size of local memory.

DISCLOSURE OF INVENTION

The shortcomings of the prior art are overcome and
additional advantages are provided in accordance with
the principles of the present invention through the
provision of a method and system for synchronizing
threads within a process.

In accordance with the principles of the present 
invention, a method for synchronizing execution by 
a processing element of threads within a process is 
provided. The process includes fetching during ex- 
ecution of a thread within a process a datum field 
from a local frame cache and an associated state 
indicator from a state bit cache. The state indicator 
has a first state value which is used to determine 
whether the datum field includes a datum available
for use by the thread. If the datum is unavailable, 
then execution of the thread is deferred until the 
datum is available. 

In one embodiment, the thread is represented 
by a continuation descriptor and the step of defer- 
ring the thread includes storing the continuation 
descriptor within the datum field. 

In yet another embodiment, the method of syn- 
chronizing threads includes awakening the deferred 
thread when the datum is available for the thread. 
Awakening includes removing the continuation
descriptors stored in the datum field and then placing
the datum in the field.
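
By way of illustration only, the deferring and awakening steps summarized above may be sketched as follows. The names `Slot`, `read` and `write` are hypothetical and do not appear in the described embodiment. The sketch also assumes that a single boolean stands in for the state indicator and that a Python list stands in for continuation descriptors held in the datum field.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, List, Tuple


@dataclass
class Slot:
    """One local frame location (hypothetical model)."""
    present: bool = False                 # stands in for the state indicator
    datum: Any = None                     # the datum field
    deferred: List[Any] = field(default_factory=list)  # parked descriptors


def read(slot: Slot, continuation: Any) -> Tuple[bool, Any]:
    """Return (True, datum) if available; otherwise defer the reader."""
    if slot.present:
        return True, slot.datum
    # The deferred thread is parked in the location awaiting the datum.
    slot.deferred.append(continuation)
    return False, None


def write(slot: Slot, value: Any, ready: Deque[Any]) -> None:
    """Awaken deferred threads, then place the datum in the field."""
    ready.extend(slot.deferred)           # remove descriptors, queue them
    slot.deferred.clear()
    slot.datum = value
    slot.present = True
```

In this sketch a reader that finds the datum unavailable costs no polling: it is simply re-queued for execution when the producing write arrives.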

In another aspect of the invention, a system for
synchronizing execution by a processing element of
threads within a process is provided. The system
includes a local frame cache and a state bit cache,
means for executing by the processing element a thread
within a process and means for fetching from the local
frame cache a datum field and from the state bit cache
an associated state indicator. The state indicator has a
first state value and the system includes means for
determining based on the first state value whether the
datum field includes a datum available for use by the
thread. Should the datum be unavailable, then means for
deferring execution of the thread until the datum is
available is provided.

In one embodiment, the system further includes
means for determining a second state for
the state indicator wherein the second state will 
replace the first state during execution of the 
thread. The first state may indicate a current state 
and the second state may indicate a next state. 

In another aspect of the invention, a method for 
synchronizing execution by a processing element 
of threads within a process is provided. Each 
thread includes a plurality of instructions and the 
method includes executing an instruction. During 
execution of the instruction, at least one source 
operand is fetched from a local frame cache and at 
least one corresponding state indicator having a 
first state is fetched from a state bit cache. Also, 
fetched from the instruction is at least one state 
function associated with at least one fetched 
source operand. The state function is used to select
from one of a plurality of tables N possible second
states for the state indicator wherein each of the
second states has an associated flag indicator. The
first state is used to choose from the selected N
possible states a second state for the state indicator
and the second state replaces the first state during
thread execution. The flag indicator specifies one of a
plurality of actions for the thread to perform.
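
As a sketch only, the two-step selection described above (the state function picks a table, the first state picks a row of that table) may be modeled as a nested lookup. The state names, flag names, state function names and table contents below are all assumptions made for illustration; the text above does not enumerate them.

```python
from enum import Enum
from typing import Dict, Tuple


class State(Enum):            # hypothetical state values
    EMPTY = 0
    DEFERRED = 1
    PRESENT = 2


class Flag(Enum):             # hypothetical actions for the thread
    CONTINUE = "continue"
    DEFER = "defer"
    WAKE = "wake"
    ERROR = "error"


Table = Dict[State, Tuple[State, Flag]]

# One table per state function (names and contents are illustrative).
TABLES: Dict[str, Table] = {
    "WRITE_ONCE_READ": {
        State.EMPTY: (State.DEFERRED, Flag.DEFER),
        State.DEFERRED: (State.DEFERRED, Flag.DEFER),
        State.PRESENT: (State.PRESENT, Flag.CONTINUE),
    },
    "WRITE_ONCE_WRITE": {
        State.EMPTY: (State.PRESENT, Flag.CONTINUE),
        State.DEFERRED: (State.PRESENT, Flag.WAKE),
        State.PRESENT: (State.PRESENT, Flag.ERROR),
    },
}


def transition(state_function: str, first_state: State) -> Tuple[State, Flag]:
    """Select a table by state function, then a row by the first state."""
    table = TABLES[state_function]        # N possible second states
    return table[first_state]             # (second state, flag indicator)
```

The returned second state would replace the first state for the location, and the flag would tell the thread what to do next.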

In accordance with the principles of the present
invention, a method and system for synchronizing threads
within a process is provided. The synchronization
mechanism of the present invention suspends execution of
a thread when data for that thread is unavailable
thereby allowing another thread to be executed. This
provides for increased system utilization and system
performance.



BRIEF DESCRIPTION OF DRAWINGS 



The subject matter which is regarded as the invention is
particularly pointed out and distinctly claimed in the
claims at the conclusion of the specification. The
foregoing and other objects, features, and advantages of
the invention will be apparent from the following
detailed description taken in conjunction with the
accompanying drawings in which:

FIG. 1

depicts one example of a block diagram of a parallel
processing system, in accordance with the principles of
the present invention;

FIG. 2

is one example of the logical components associated
with a main memory control unit of the parallel
processing system of FIG. 1, in accordance with the
principles of the present invention;

FIG. 3

is an illustration of one embodiment of a logical local
frame residing in the main memory control unit of
FIG. 2, in accordance with the principles of the present
invention;

FIG. 3a

depicts one example of a logical work frame associated
with the local frame of FIG. 3, in accordance with the
principles of the present invention;

FIG. 3b

illustrates one embodiment of the fields contained
within a non-compressed continuation descriptor of the
present invention;

FIG. 4

is an illustration of one embodiment of the entries of a
logical code frame residing in main memory control units
of FIG. 2, in accordance with the principles of the
present invention;

FIG. 4a

depicts one example of the fields within an instruction
located in the code frame of FIG. 4, in accordance with
the principles of the present invention;

FIG. 4b

depicts one example of the components of the destination
and source specifiers of the instruction of FIG. 4a, in
accordance with the principles of the present invention;

FIG. 5

illustrates one embodiment of a block diagram of the
hardware components of a processing element of FIG. 1,
in accordance with the principles of the present
invention;

FIG. 6

depicts one example of the components of a ready queue
entry within a ready queue depicted in FIG. 5, in
accordance with the principles of the present invention;

FIG. 7

is one example of the components of a local continuation
queue within the processing element of FIG. 5, in
accordance with the principles of the present invention;

FIG. 8

illustrates one example of the components associated
with a code frame cache residing within the processing
element of FIG. 5, in accordance with the principles of
the present invention;

FIG. 9

depicts one example of a code frame cache directory
associated with the code frame cache of FIG. 8, in
accordance with the principles of the present invention;

FIGS. 10a, 10b

depict one example of a block diagram of the components
associated with a local frame cache located within the
processing element depicted in FIG. 5, in accordance
with the principles of the present invention;

FIG. 11

depicts one example of a local frame cache directory
associated with the local frame cache of FIGS. 10a, 10b,
in accordance with the principles of the present
invention;

FIGS. 12a, 12b

depict one example of a flow diagram of the
synchronization process of the present invention; and

FIG. 13

depicts one example of a flow diagram of the processes
associated with writing data to a location in a cache,
in accordance with the principles of the present
invention.

BEST MODE FOR CARRYING OUT THE INVENTION

The synchronization mechanism of the present invention
resides within a parallel data processing system 10, as
depicted in FIG. 1. In one embodiment, parallel data
processing system 10 includes one or more main memory
control units 12, a plurality of processing elements
(PE) 14 and a plurality of input/output processors
(I/OP) 16. Processing elements 14 communicate with each
other, main memory control units 12 and input/output
processors 16 through an interconnection network 18. One
example of the main components associated with main
memory control units 12 (or main storage) and processing
elements 14 are explained in detail below.

Each of main memory control units 12 contains a portion
of a sequentially addressed linear memory address space
(not shown). The basic unit of information stored in the
address space is a word (or memory location) having a
unique address across all main memory control units.
Contiguous words or memory locations may be combined
into a logical structure such as a local frame 20
(FIG. 2), a code frame 22 and a work frame 23. In one
embodiment, local frame 20 and work frame 23 generally
refer to a group of data words and code frame 22 refers
to a group of instructions. There may be a plurality of
local frames, work frames and code frames within main
memory control units 12. In one embodiment, a particular
local frame is associated with a process such that the
address of a local frame is used as the identifier of a
process.

Referring to FIG. 3, local frame 20 has, in one example,
256 local frame locations 24. The first four locations
are reserved for an invocation context map entry 26,
which is associated with a process to be executed by one
of processing elements 14, the next two slots are
reserved for functions not discussed herein and the
remainder of the locations (six through 255) are
reserved for data local to the process. The information
contained within invocation context map entry 26 is
established prior to instantiation of the process and
includes, for instance, the following fields:

(a) A three bit state indicator (ST) 28 which indicates
the current state of a local frame location. As
described further below, state indicator 28 is relative
to a state function, which is used in accessing the
frame location;

(b) A twelve bit physical processing element 
(PE) number 30 which identifies which process- 
ing element is to execute the process; 

(c) A three bit process state (PS) 32 which indicates
the state of the associated process. A process may have
any one of a number of states including, for example: a
free state, which is used as a reset state to indicate
that the process is no longer active; an inactive state,
used to prevent a process from executing; a suspended
state, used to prevent any modification to a process so
that, for example, the operating system may perform a
recovery; an active state in main storage, used to
indicate that the process can execute and that it is in
main memory; or an active state not in main storage,
used to indicate that the process can execute and it is
within the processing element assigned to execute the
process;

(d) A two bit local frame state (FS) 34 which indicates
the state of local frame 20. Local frame 20 may have, as
an example, one of the following states:
a present state representing that the local frame is
present in main storage;
a transient state representing that the local frame is
transient between main storage and memory local to the
processing element, which is identified by processing
element number 30; and
an absent state indicating that references to local
frame 20 are to be redirected to the processing
element's local memory, as indicated by physical
processing element number 30;

(e) A one bit invocation context queue control (ICQC) 36
which indicates the manner in which the process is
enqueued onto an invocation context queue (described
below) at instantiation;

(f) A one bit cache pinning control (CPC) 37 which
indicates whether a code frame or a local frame located
within the processing element (e.g., within a code frame
cache or a local frame cache, which is described below)
is to be pinned;

(g) An eight bit local continuation queue head (LCQH)
pointer 38 which contains an offset into a first entry
of work frame 23 (FIG. 2) which is in contiguous memory
locations to local frame 20 (described below);

(h) An eight bit local continuation queue tail 
(LCQT) 40 which contains an offset into the first 
empty entry at the end of work frame 23; 

(i) A forty bit invocation context queue (ICQ) 
forward pointer 42 which is used in creating a 
doubly linked list of processes (referred to as an 
invocation context queue) in main storage for 
the processing element identified by processing 
element number 30. The invocation context 
queue has a head and a tail which are retained 
within the processing element, and it is ordered 
based on the enqueue discipline indicated by 
invocation context queue control 36; 

(j) A forty bit invocation context queue (ICQ) backward
pointer 44 which is also used in creating the invocation
context queue of processes; and

(k) A forty bit code frame pointer 46 which specifies
the address of a current code frame 22.

As previously stated, locations six through 255 of local
frame 20 are reserved for data. Each data entry 48
includes state indicator 28 and a 64-bit datum field 50.
Datum field 50 contains the data used during execution
of the process.
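
The field widths listed above for invocation context map entry 26 can be checked and exercised with a short sketch. The field names are abbreviations taken from the text; the packing order (first listed field in the low-order bits) and the helper names `pack_icm` and `unpack_icm` are assumptions for illustration only, since the actual bit positions are not stated here. The capacity check further assumes 64-bit words, matching the 64-bit datum field.

```python
from typing import Dict, List, Tuple

# (name, width in bits) in the order the fields are listed in the text.
ICM_FIELDS: List[Tuple[str, int]] = [
    ("ST", 3), ("PE", 12), ("PS", 3), ("FS", 2), ("ICQC", 1), ("CPC", 1),
    ("LCQH", 8), ("LCQT", 8),
    ("ICQ_FWD", 40), ("ICQ_BWD", 40), ("CODE_FRAME", 40),
]

TOTAL_BITS = sum(width for _, width in ICM_FIELDS)
RESERVED_BITS = 4 * 64   # four reserved locations, assuming 64-bit words


def pack_icm(values: Dict[str, int]) -> int:
    """Pack field values into one integer (assumed low-to-high order)."""
    word, shift = 0, 0
    for name, width in ICM_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_icm(word: int) -> Dict[str, int]:
    """Inverse of pack_icm."""
    fields = {}
    for name, width in ICM_FIELDS:
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

Under these assumptions the listed fields occupy 158 bits, which fits within the four locations reserved for the entry.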

Referring once again to FIG. 2, coupled to each local
frame 20 is a logical structure referred to as work
frame 23. Work frame 23 is allocated in the next 256
contiguous locations to local frame 20. Local frame 20
and work frame 23 are managed as a single entity. The
first, for example, sixty-four entries or locations 52
of work frame 23 (FIG. 3a) include one or more
compressed continuation descriptors (CCD) 54 used, as
explained below, in selecting an instruction to be
executed or data which is to be used by the instruction.
Each compressed continuation descriptor 54 includes, for
instance, a code offset 56 and an index 58 (described
below). In contrast, a continuation descriptor which is
not compressed also includes a local frame pointer 60
(FIG. 3b), which indicates the beginning of local frame
20. A compressed continuation descriptor does not need
to store the local frame pointer, since it may be
inferred from the main storage address of the local
frame/work frame pair. In one embodiment, each location
52 in work frame 23 is capable of storing four
compressed continuation descriptors.
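
A minimal sketch of how four compressed continuation descriptors might share one work frame location follows. It assumes a 64-bit location, and therefore 16 bits per descriptor, split into an 8-bit code offset and an 8-bit index. These widths are inferred from the 256-entry frames and are not stated in the text, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

CCD = Tuple[int, int]   # (code offset, index)


def pack_ccds(descriptors: List[CCD]) -> int:
    """Pack up to four (code_offset, index) pairs into one 64-bit word."""
    if len(descriptors) > 4:
        raise ValueError("a location holds at most four descriptors")
    word = 0
    for slot, (code_offset, index) in enumerate(descriptors):
        if not (0 <= code_offset < 256 and 0 <= index < 256):
            raise ValueError("code offset and index must each fit in 8 bits")
        word |= ((code_offset << 8) | index) << (16 * slot)
    return word


def unpack_ccds(word: int, count: int) -> List[CCD]:
    """Recover the first `count` descriptors from a packed word."""
    result = []
    for slot in range(count):
        half = (word >> (16 * slot)) & 0xFFFF
        result.append((half >> 8, half & 0xFF))
    return result


def expand(ccd: CCD, local_frame_pointer: int) -> Dict[str, int]:
    """Rebuild a non-compressed descriptor; the frame pointer is inferred
    from the address of the local frame/work frame pair."""
    code_offset, index = ccd
    return {"local_frame_pointer": local_frame_pointer,
            "code_offset": code_offset, "index": index}
```

The `expand` helper reflects the point made above: the local frame pointer need not be stored with each descriptor because it is known from where the descriptor resides.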

Referring once again to FIG. 2, the local frame/work
frame pair is coupled to a code frame 22 through code
frame pointer 46 of invocation context map entry 26
embedded within local frame 20. Code frame 22 includes,
for instance, 256 code frame locations 62 (FIG. 4) and
each location includes a word-sized instruction 64 or an
immediate constant (not shown), which is associated with
the process to be executed by processing element 14 as
indicated by processing element number 30. Subsequent to
loading the instructions or constants (data for
constants are stored at code frame generation) into the
code frame, the code frame is immutable and thus, may be
shared by other processes and processing elements. In
one embodiment, code frame 22 is managed in main storage
in sequentially addressed and contiguous groupings of
sixteen word blocks. This allows for efficient transfer
of code frames in main storage to memory locations local
within the processing element (e.g., code frame caches,
which are described below).

Referring to FIG. 4a, instruction 64 is, for instance,
64 bits in length and includes the following fields:

(a) An 8-bit instruction operation code 66 which
specifies the operation to be executed by the processing
element. The operation code controls the
arithmetic/logical units and instruction sequencing. In
addition, it also controls network request generation;

(b) A two-bit thread control (TC) 68 which specifies the
sequencing controls for the current thread and its
successors (a process includes one or more threads of
executable instructions) within the processing element
in which the threads are being executed. The sequencing
may be, for example, sequential instruction dispatch,
preventive suspensive submode or end of thread, each of
which are described herein.
Sequential instruction dispatch is the mode of execution
which entails sequential dispatch of the instructions of
a thread being executed by the processing element.
Preventive suspensive submode causes suspension of the
current thread at initial dispatch into the processing
element of an instruction within the thread. Should the
instruction execute successfully, the thread is enqueued
in a last in-first out fashion onto a local continuation
queue of the current process [...], as described below.
Following execution of the instruction, a new thread is
dispatched into the processing element.
End of thread indicates to the processing element that
the current thread ends after execution of this
instruction. When termination of the thread is detected,
the processing element switches to the next thread to be
executed. This thread may be from the same process or a
higher priority process, which is enqueued LIFO (Last
in-First out) after initial dispatch of the current
process into the processing element.

(c) A two-bit index increment control (X) 70
which controls the increment of the value of
index 58 (FIGS. 3a, 3b) in the current continu-
ation descriptor. When index increment control 
70 is set to a nonzero value, index 58 is updated 
after execution of the current instruction and 
prior to execution of the succeeding instructions 
in the same thread. 

In one example, index increment control 70 may 
indicate no change in the value of index 58 from 
this instruction to the next or it may indicate the 
incremental value is plus one or minus one; 

(d) A sixteen bit destination specifier 72 is an 
address which indicates, for instance, the target 
of the instruction execution's result; and 

(e) A sixteen bit source operand 0 specifier 74 and a
sixteen bit source operand 1 specifier 76. Source
operand specifiers 74, 76 are addresses which enable
source operands to be obtained for the execution
functions within the processing element.
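
For illustration, the instruction fields listed above may be extracted from a 64-bit word as sketched below. The listed widths total 60 bits (8 + 2 + 2 + 16 + 16 + 16); the text does not give bit positions or say how the remaining four bits are used, so the layout chosen here (operation code in the high-order bits, four unused bits after the index increment control) is purely an assumption, as are the names `Instruction`, `decode` and `encode`.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    opcode: int            # 8-bit instruction operation code 66
    thread_control: int    # 2-bit thread control (TC) 68
    index_increment: int   # 2-bit index increment control (X) 70
    dest: int              # 16-bit destination specifier 72
    src0: int              # 16-bit source operand 0 specifier 74
    src1: int              # 16-bit source operand 1 specifier 76


def decode(word: int) -> Instruction:
    """Split a 64-bit instruction word using the assumed layout."""
    if not 0 <= word < (1 << 64):
        raise ValueError("instruction must fit in 64 bits")
    return Instruction(
        opcode=(word >> 56) & 0xFF,
        thread_control=(word >> 54) & 0x3,
        index_increment=(word >> 52) & 0x3,
        # bits 51..48 are left unused in this sketch
        dest=(word >> 32) & 0xFFFF,
        src0=(word >> 16) & 0xFFFF,
        src1=word & 0xFFFF,
    )


def encode(ins: Instruction) -> int:
    """Inverse of decode for words whose unused bits are zero."""
    return ((ins.opcode << 56) | (ins.thread_control << 54)
            | (ins.index_increment << 52) | (ins.dest << 32)
            | (ins.src0 << 16) | ins.src1)
```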

Destination specifier 72 and source operand 
specifiers 74 and 76 each contain the following 
fields, as depicted in FIG. 4b: 

(a) A four bit addressing mode field (AM) 78 used to
encode the various sources (including, for example,
signed literal, indexed signed literal, local frame
cache (described below) or indexed local frame cache)
for the instruction operand and destination specifiers.
Addressing mode also encodes whether indexed operand
addressing is to be used.

(b) A four bit state function field (SF) 80. In one
embodiment, instructions accessing locations within a
local frame cache (described further below) include, for
each source operand specifier and the destination
specifier, a state function used in indicating the
synchronization function being used by that specifier.
In accordance with the principles of the present
invention, a number of synchronization functions may be
supported and, therefore, there is a state function
associated with each of the available synchronization
functions. Each state function allows, for example, two
interpretations, one for a read access and one for a
write access. Examples of the synchronizing functions
which may be supported by the present invention include:
One-Time Producer, Multiple Consumers (OPMC), which is
similar to I-structures and has a write once property.
It refers to the production of a data value which may be
used by a number of threads; and Multiple Producer,
Single Consumer, which refers to the production of
several data values used by one thread. In one
embodiment, the resulting actions may be dependent on
the state function applied, the current state of the
local frame location, and the access type, read or
write, as described in detail below.
The state function field is ignored when ad- 
dressing mode 78 selects, for example, a literal 
operand. 

(c) An eight bit frame offset 82 interpreted as an 
offset into one of the accessible local frames 
within a local frame cache (described below) or 
as a literal operand, as selected by the address- 
ing mode of the source/destination specifier. As 
explained more fully below, when frame offset 
82 is used as a frame offset it can be used 
directly or it can be initially added to the value 
of index 58 from the current continuation de- 
scriptor, modulo 256. In one embodiment, it is 
then appended to local frame pointer 60 indicated by
addressing mode 78 and a local frame access within the
local frame cache is attempted under control of state
function 80, described in detail below.
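
The specifier fields and the indexed offset calculation described above can be sketched as follows. The ordering of the three fields within the sixteen bits (addressing mode highest, frame offset lowest) is an assumption, and because the addressing mode encodings are not given, the sketch takes a hypothetical boolean `indexed` rather than decoding the mode.

```python
from typing import Tuple


def split_specifier(specifier: int) -> Tuple[int, int, int]:
    """Return (addressing mode, state function, frame offset)."""
    if not 0 <= specifier < (1 << 16):
        raise ValueError("specifier must fit in 16 bits")
    addressing_mode = (specifier >> 12) & 0xF   # 4-bit AM 78
    state_function = (specifier >> 8) & 0xF     # 4-bit SF 80
    frame_offset = specifier & 0xFF             # 8-bit frame offset 82
    return addressing_mode, state_function, frame_offset


def effective_offset(frame_offset: int, index: int, indexed: bool) -> int:
    """Use the offset directly, or add index 58 to it modulo 256."""
    if indexed:
        return (frame_offset + index) % 256
    return frame_offset
```

Because the sum is taken modulo 256, an indexed access always lands within the 256 locations of the frame; for example, offset 250 with index 10 wraps to location 4.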

Each code frame 22 within main memory control units 12
may exist in a state of absent or present. These states
exist and are managed by software. An absent state
indicates that the code frame is not in main storage and
therefore, a process requiring the absent code frame is
prevented from being instantiated. A present state
indicates that the code frame is present in main storage
and therefore, an inpage request from a processing
element may be serviced. Once the code frame is in this
state, it remains in this state until the frame is no
longer required and it is returned to free storage under
software control.

Referring once again to FIG. 1, main memory control
units 12 are coupled to processing elements 14 through
interconnection network 18 (FIG. 1). In accordance with
the principles of the present invention, one example of
the hardware components associated with each processing
element 14 are depicted in FIG. 5 and include the
following: a ready queue 84, a local continuation queue
86, a code frame cache 88, a local frame cache 90, a
state bit cache 91 and an execution unit 92. Each of
these components are described in detail herein.

Ready queue 84 is, for example, a fully associative
memory structured essentially as a queue that is capable
of being enqueued at the head or the tail depending on
the enqueue discipline of the ready queue as specified
by invocation context queue control 36. Ready queue 84
includes a number of ready queue entries 94 (FIG. 6)
corresponding to processes or invocations to be executed
by processing element 14. In one instance, ready queue
84 includes sixteen ready queue entries. As depicted in
FIG. 6 and described herein, each ready queue entry 94
includes, for example, the following fields:

(a) A three bit ready queue (RQ) state 95 used to
indicate the current state of a ready queue entry. Each
ready queue entry may be in one of a number of states,
including, for instance, the following: empty,
indicating that the entry is unused and available;
ready, indicating that the process is ready for
execution by the processing element; pending pretest,
indicating that the process is awaiting pretesting
(described further below); prefetch active, indicating
that the required resources for execution of the process
are being fetched (this will also be described in detail
below); sleeping, indicating that the ready queue entry
is valid, but no threads within the process are
available for dispatching to the processing element; and
running, indicating a thread from the process
represented by the ready queue entry is being executed
by the processing element;

(b) A local frame pointer 96 is used, for instance, in
accessing local frame cache 90, which as described in
more detail below, includes the data required for
process execution. In addition, local frame pointer 96
is used in determining whether a local frame exists
within the local frame cache. Local frame pointer 96 is
a copy of local frame pointer 60 and is loaded at the
time that the ready queue entry is filled in;

(c) A local frame cache physical pointer 97 is an
address into local frame cache 90;

(d) A three bit local frame cache state 98 is used to
indicate the current state of a local frame in local
frame cache 90. A local frame within the local frame
cache may have a number of states including, for
example: empty, indicating that the frame state for that
local frame is unknown or not present; transient,
indicating the local frame is currently being inpaged
from main memory control units 12 to local frame cache
90 in processing element 14; and present, indicating the
local frame is located in local frame cache 90;

(e) A code frame pointer 99 is used in accessing code
frame cache 88. Code frame pointer 99 is a copy of code
frame pointer 46 located in local frame 20;

(f) A code frame cache physical pointer 100 is used to
address a block of instructions in code frame cache 88,
as described further below;

(g) A three bit code frame cache state 101 is used to
determine the current state of a code frame within code
frame cache 88. A code frame may have a number of states
including, for example: empty, indicating that the frame
state for a particular code frame is unknown or not
present; transient, indicating the code frame is
currently being inpaged from main memory control units
12 into code frame cache 88 in processing element 14;
and present, indicating the code frame is located in
code frame cache 88;

(h) A local continuation queue head pointer 102 is
located in each ready queue entry and is used, as
described more fully below, in indicating the head of
the list of threads for the particular process to be
executed within a processing element. During processing,
as described below, local continuation queue head
pointer 102 receives its information from local
continuation queue head pointer 38 located within
invocation context map entry 26 of local frame 20 which
is associated with the process to be executed; and

(i) A local continuation queue tail pointer 103 is
located in each ready queue entry and is used, as
described more fully below, in indicating the tail of
the list of threads for the particular process. Similar
to head pointer 102, local continuation queue tail
pointer 103 is received from local frame 20. In
particular, during enqueue into the ready queue, local
continuation queue tail pointer 40 in local frame 20 is
copied into ready queue entry 94.
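
As a rough model only, the ready queue and a subset of its entry fields might be represented as below. The numeric values assigned to the states, the choice of `PENDING_PRETEST` as the default state, and the names `ReadyQueueEntry`, `ReadyQueue` and `first_ready` are assumptions. The `at_head` argument stands in for the enqueue discipline selected by invocation context queue control 36, whose encoding is not given in the text.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Deque, Optional


class RQState(Enum):          # six states fit in the three bit RQ state
    EMPTY = 0
    READY = 1
    PENDING_PRETEST = 2
    PREFETCH_ACTIVE = 3
    SLEEPING = 4
    RUNNING = 5


@dataclass
class ReadyQueueEntry:
    local_frame_pointer: int  # copy of local frame pointer 60
    code_frame_pointer: int   # copy of code frame pointer 46
    lcq_head: int             # copied from LCQH pointer 38
    lcq_tail: int             # copied from LCQT pointer 40
    state: RQState = RQState.PENDING_PRETEST


class ReadyQueue:
    CAPACITY = 16             # sixteen entries in one instance

    def __init__(self) -> None:
        self.entries: Deque[ReadyQueueEntry] = deque()

    def enqueue(self, entry: ReadyQueueEntry, at_head: bool) -> None:
        """Enqueue at the head or the tail per the enqueue discipline."""
        if len(self.entries) >= self.CAPACITY:
            raise OverflowError("ready queue is full")
        if at_head:
            self.entries.appendleft(entry)
        else:
            self.entries.append(entry)

    def first_ready(self) -> Optional[ReadyQueueEntry]:
        """Return the entry nearest the head that is in the ready state."""
        for entry in self.entries:
            if entry.state is RQState.READY:
                return entry
        return None
```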

Associated with each ready queue entry 94 is a local
continuation queue 86 (FIG. 5). Each local continuation
queue is, for example, a first in-first out queue
wherein the top entry in the queue is the oldest. In
general, local continuation queue 86 contains all of the
pending threads or continuations associated with a
process which is on the ready queue. The local
continuation queue head and tail pointers located in
ready queue entry 94 indicate the valid entries in the
local continuation queue for the particular ready queue
entry. Depicted in FIG. 7 is one example of local
continuation queue 86.

Local continuation queue 86 includes a number
of local continuation queue entries 104, in which
each entry represents a pending thread for a
particular process. Each local continuation queue entry
104 contains a compressed continuation descriptor
including a code offset 105 and an index 106,
which are received from work frame 23 (i.e., code
offset 56, index 58) of main memory control units
12. Code offset 105 indicates the location of an
instruction within a code frame located in code frame
cache 88 and index 106 is used during indexed
addressing to alter the value of the specifiers used
to locate data within local frame cache 90.
Local continuation queue 86 is coupled to code
frame cache 88 via code frame cache physical
pointer 100, as described in detail herein. Referring
to FIG. 8, code frame cache 88 includes, in one
example, 128 code frames 108 and each code
frame includes, e.g., 256 instructions. In one
embodiment, the code frames located in code frame
cache 88 are inpaged from main memory control
units 12 to code frame cache 88 during a prefetch
stage, described below. Code frame cache 88
supports two simultaneous access ports: a read port
used in fetching instructions and a write port used
in writing code frames from main storage to the
code frame cache.

In order to locate code frame 108, code frame 
pointer 99 located in ready queue entry 94 of 
ready queue 84 is input into a code frame cache 
directory 110 in order to obtain a value for code 
frame cache physical pointer 100. In one
embodiment, code frame cache directory 110 is organized
to allow an 8-way set-associative search.

Referring to FIG. 9, code frame cache directory
110 includes, for example, sixteen rows and eight
columns and each column and row intersection
includes an entry 114. Each entry 114 includes a
code frame address tag 116 and a state field 118.
Code frame address tag 116 is, for example, the
upper thirty-six bits of the 40-bit code frame
pointer 99 and is used in determining the address value
of code frame cache physical pointer 100. State
field 118 is a three-bit field used in indicating the
state of a particular code frame 108 within code
frame cache 88. A code frame within the code
frame cache may have one of the following states:

(a) An empty state which is defined by an
unsuccessful attempt within a processing element
to locate a particular code frame within the code
frame cache. This state is proper when the code
frame exists only in main storage or within
another processing element. The empty state is
recorded in the code frame cache at system
initialization and whenever a code frame
invalidation occurs.

(b) A transient state which applies to a code
frame when it is in a state of motion. For
example, the code frame is being moved from main
storage to the code frame cache within the
processing element (an inpage operation). During
inpaging, one of two possible transient
states may be recorded for the frame, depending
on the desired final state of the code frame
at inpage completion. The state is recorded as
transient-final state, where final state may be
present for a pretest/prefetch inpage
(described below), or pinned for a
pretest/prefetch inpage with invocation context
map entry cache pinning control 37 active.
The transient state of a code frame in the code
frame cache prevents selection of the code
frame by a cache replacement algorithm, such
as, for example, a least recently used (LRU)
algorithm, thereby allowing eventual completion
of the inpage operation.

(c) A present state which indicates that the
contents of the desired code frame are entirely
within code frame cache 88. When the code
frame is in this state, then processing element
14 may fetch the instructions located in code
frame cache 88.

(d) A pinned state which also indicates that the
contents of the desired code frame are entirely
within the code frame cache. However, if a code
frame is marked as pinned, then replacement of
the frame during pretest/prefetch is prevented
(described below). In order to remove a pinned
code frame from the cache, explicit software
action is taken.

Address tag 116 is used in conjunction with
code frame pointer 99 to determine an address
value for code frame cache physical pointer 100. In
particular, the four rightmost bits of code frame
pointer 99 (FIG. 8) are used to index into one of the
rows within code frame cache directory 110.
Subsequent to obtaining a particular row, the contents
of each code frame cache address tag 116 within
the selected row are compared against the value of
bits 12-47 of code pointer 46. If a match is found,
then the address value of the code frame cache
physical pointer is obtained. In particular, the
address of pointer 100 is equal to the row identifier
(i.e., the four rightmost bits of code frame pointer
99) and column identifier, which is the binary
representation of the column (i.e., columns 0-7) in which
the match was found.
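
The directory search described above can be sketched as follows. This is
only an illustrative Python model, not the hardware of the embodiment: the
dictionary layout of an entry, the function names, the rule that an empty
entry never matches, and the ordering of the row bits ahead of the column
bits in the physical pointer are all assumptions made for the sketch.

```python
ROWS = 16       # sixteen rows in code frame cache directory 110
COLUMNS = 8     # eight columns (8-way set-associative)
ROW_BITS = 4    # the four rightmost pointer bits select the row
COLUMN_BITS = 3 # columns 0-7 need three bits

def split_pointer(code_frame_pointer):
    """Split a 40-bit code frame pointer into (tag, row)."""
    row = code_frame_pointer & (ROWS - 1)   # four rightmost bits
    tag = code_frame_pointer >> ROW_BITS    # upper thirty-six bits
    return tag, row

def lookup(directory, code_frame_pointer):
    """Return the physical pointer on a match, or None on a miss.

    directory[row][column] is either None or a dict with hypothetical
    keys "tag" and "state".
    """
    tag, row = split_pointer(code_frame_pointer)
    for column, entry in enumerate(directory[row]):
        if entry is None or entry["state"] == "empty":
            continue  # assumption: empty entries never match
        if entry["tag"] == tag:
            # Row identifier followed by column identifier (assumed order).
            return (row << COLUMN_BITS) | column
    return None

# Usage: install one frame in row 5, column 3, then look it up.
directory = [[None] * COLUMNS for _ in range(ROWS)]
pointer = (0xABCDE1234 << ROW_BITS) | 0x5
directory[5][3] = {"tag": 0xABCDE1234, "state": "present"}
hit = lookup(directory, pointer)
```

With sixteen rows and eight columns the resulting pointer is seven bits
wide, which is enough to name each of the 128 code frames of the example.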

Subsequent to determining code frame cache
physical pointer 100, the physical pointer is used in
conjunction with code offset 105 located in local
continuation queue 86 to locate an instruction 120
within code frame 108. In order to select a
particular instruction 120 within code frame 108, code
frame cache physical pointer 100 is appended at
122 on the left of code offset 105 located in local
continuation queue entry 104.
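
As a small illustration of the appending step, the following Python sketch
forms an instruction address by placing the physical pointer to the left of
the code offset. The eight-bit offset width is an assumption derived from
the example of 256 instructions per code frame, and the function name is
hypothetical.

```python
OFFSET_BITS = 8  # assumption: 256 instructions per code frame

def instruction_address(physical_pointer, code_offset):
    """Append the physical pointer on the left of the code offset."""
    if not 0 <= code_offset < (1 << OFFSET_BITS):
        raise ValueError("code offset does not fit within one code frame")
    return (physical_pointer << OFFSET_BITS) | code_offset

# Usage: frame 43, instruction 16 within that frame.
address = instruction_address(43, 16)
```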

In one embodiment, instruction 120 includes
the following fields which are loaded from the copy
of code frame 22 located within main storage (the
following fields are similar to the instruction fields
described with reference to FIG. 4a, and therefore,
some of the fields are not described in detail at this
point): an operation code (OP CODE) 124, a thread
control (TC) 126, an index increment control (X)
127, a destination specifier 126, a source operand
zero specifier 128 and a source operand one
specifier 130. Destination specifier 126 indicates the
address in which the result of the instruction
execution is to be written and the source operand
specifiers indicate the addresses of the data operands
located in local frame cache 90 to be read and
used during execution of the instruction.

Code frame cache 88 is coupled to local frame
cache 90, as described in detail herein. Referring
to FIGS. 10a, 10b, local frame cache 90 includes,
for example, 256 local frames (131) and each
frame includes 256 data words (132) (e.g.,
invocation context queue information, destination location,
source operands). In one embodiment, local frame
cache 90 is organized into eight parallel word-wide
banks. Each local frame 131 spans across all eight
banks such that each bank stores thirty-two words
of local frame 131. In one example, the first bank
(bank 0) holds the following words of local frame
131: words 0, 8, 16, 24, 32, 40, ..., 248 (i.e., every
eighth word of the local frame); the second bank
(bank 1) holds words 1, 9, 17, 25, 33, 41, ..., 249,
etc. It will be apparent to one of ordinary skill in
the art that this is only one way in which the local
frame cache may be organized and the invention is
not limited to such a way. Local frame cache 90
supports two simultaneous access ports, a read port
and a read/write port (not shown). The read port is
used for fetching operands and the read/write port
is used for storing results from instruction
execution and for deferring continuations, as described
below.

In one embodiment, the local frames located in
local frame cache 90 are inpaged from main
memory control units 12 (i.e., datum 50 is inpaged) to
local frame cache 90 during a prefetch stage,
described below. In order to locate a local frame
within the local frame cache (so that inpaged
information may be written to a location within the
local frame or so that information may be read
from a particular location), local frame pointer 96,
located in ready queue entry 94, is input into a local
frame cache directory 133 in order to obtain an
address value for local frame cache physical
pointer 97 located in the ready queue entry. (In another
embodiment, it is also possible to obtain the local
frame cache physical pointer during pretesting
(described below), thereby eliminating the process
for obtaining the pointer address from the cache
directory.) In one embodiment, local frame cache
directory 133 is organized in a similar manner to
code frame cache directory 110, i.e., it is organized
to allow an 8-way set-associative search.

Referring to FIG. 11, local frame cache
directory 133 includes, for example, thirty-two rows and
eight columns and each column and row
intersection includes an entry 134. Each entry 134 includes
a local frame address tag 136 and a state field 138.
Local frame address tag 136 is, for example, the
upper thirty-five bits of the 40-bit local frame
pointer 96 and is used in determining the address value
of local frame cache physical pointer 97. State field
138 is a three-bit field used in indicating the state
of a particular local frame 131 within local frame
cache 90. A local frame within local frame cache
may have one of the following states:

(a) An empty state which is defined by an
unsuccessful attempt within a processing element
to locate a particular local frame within local
frame cache 90. This state is valid for a local
frame on the main storage free frame list or for
one which resides [...] and is allocated to a
process. The empty state may
also be detected when a castout from the local
frame cache to main storage is in progress for
the referenced local frame, resulting in the
actual inpage being delayed until castout
completion. The empty state is recorded throughout
local frame cache at system initialization and
within local frame 131 in the cache whenever an
attempted local frame inpage replacing a frame
in cache is aborted.

(b) A transient state which applies to a local
frame when it is in a state of motion, e.g.,
moving from main storage to the local frame cache
(a local frame inpage). During the local frame
inpage, the state of the inpaging frame is
recorded within local frame cache state 98. During
inpage, one of the transient states is recorded
for the frame depending upon the desired final
state of local frame 131 at inpage completion.
The final state may be present for a
pretest/prefetch inpage (explained further below)
or pinned for a pretest/prefetch inpage with
invocation context map entry pinning control 37
active. The transient state in the local frame
cache prevents selection of local frame 131 by a
local frame cache replacement algorithm (LRU),
thereby allowing eventual completion of the
inpage operation. This allows completion of any
castout associated with the inpage.

(c) A free state which indicates a local
frame in the local frame cache which is currently
not allocated to any process. As one example, a
local frame enters this state through process
termination.

(d) A present state which indicates that the
contents of local frame 131 are entirely within
the local frame cache. When the local frame is
within this state, the contents are available for
access by an instruction within the processing
element.

(e) A pinned state which also indicates that the
contents of the desired local frame are entirely
within the local frame cache. However, if a local
frame is marked as pinned, then replacement of
the frame by pretest/prefetch is prevented
(described below). In order to remove a pinned
local frame from the cache, software action is
taken.

Address tag 136 is used in conjunction with
local frame pointer 96 to determine the address
value of local frame cache physical pointer 97. In
particular, the five rightmost bits of local frame
pointer 96 are used to index into one of the rows
within local frame cache directory 133. Subsequent
to obtaining a particular row, the contents of each
local frame address tag 136 within the selected row
are compared against the value of bits [...] of the
logical local frame address (base address of the
local frame in main storage). If a match is found,
then the address value of local frame cache
physical pointer 97 is obtained. In particular, the
address of the pointer is equal to the row identifier
(i.e., the five rightmost bits of local frame pointer
96) and column identifier, which is the binary
representation of the column (i.e., columns 0-7) in which
the match was found.

Subsequent to determining local frame cache
physical pointer 97, the physical pointer is used
along with source operand 0 specifier 128 and
index 106 (i.e., the index is used if the addressing
mode indicates that index addressing is to be
used) to select a datum 132 from local frame
cache 90 representative of a source operand 0 to
be used during execution of an instruction. That is,
a frame offset 140 of source operand 0 specifier
128 (as previously described, each specifier
includes an addressing mode (AM), state function
(SF) and frame offset) is added at 142 to index 106
and then, local frame cache physical pointer 97 is
appended on the left of the summation to indicate
a particular datum (e.g., source operand 0) within
the local frame cache.

Similarly, local frame cache physical pointer 97
is used with source operand 1 specifier 130 and
index 106 to select a datum 132 from local frame
cache 90 representative of a source operand 1 also
to be used during instruction execution. In
particular, a frame offset 144 of source operand 1
specifier 130 is added at 146 to index 106 and then,
local frame cache physical pointer 97 is appended
on the left of the summation to indicate a particular
datum (e.g., source operand 1) within the local
frame cache.

In addition to the above, local frame cache
physical pointer 97 is also used with destination
specifier 126 and index 106 (again, if the index is
to be used) to select a datum 132 from local frame
cache 90 representative of the location within the
local frame cache in which, e.g., the result of the
instruction execution is to be stored. In particular, a
frame offset 147 of destination specifier 126 is
added at 149 to index 106 and then, local frame
cache physical pointer 97 is appended on the left
of the summation to indicate a particular datum
(e.g., a result location) within the local frame cache.
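
The address formation used for the two source operands and the destination
can be sketched in Python as below, together with the bank interleaving of
the one example organization of local frame cache 90. The function names,
the eight-bit word field, and the choice to reject a sum that leaves the
frame are assumptions of this sketch; the text does not say how such an
overflow is handled.

```python
WORDS_PER_FRAME = 256  # data words per local frame in the example
WORD_BITS = 8          # assumption: 256 words need eight bits
BANKS = 8              # eight parallel word-wide banks

def datum_address(physical_pointer, frame_offset, index=0, indexed=False):
    """Add the index to the frame offset (only when index addressing is
    selected), then append the physical pointer on the left of the sum."""
    word = frame_offset + (index if indexed else 0)
    if not 0 <= word < WORDS_PER_FRAME:
        # Assumption: a sum outside the frame is treated as an error here.
        raise ValueError("summation falls outside the local frame")
    return (physical_pointer << WORD_BITS) | word

def bank_of(address):
    """Bank holding the word: bank 0 holds words 0, 8, 16, ...;
    bank 1 holds words 1, 9, 17, ...; and so on."""
    return (address & (WORDS_PER_FRAME - 1)) % BANKS

# Usage: frame 5, frame offset 10, index 7 with index addressing.
address = datum_address(5, 10, index=7, indexed=True)
bank = bank_of(address)
```

The same address could also be used to select the associated state
indicator, since the state bit cache is described as being organized like
the local frame cache.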

Associated with each datum stored in local
frame cache 90 is a 3-bit state indicator 148
located in state bit cache 91. Similar to local frame
cache 90, state bit cache 91 includes, for example,
256 locations (152) and each location includes 256
3-bit state indicators 148. In one embodiment, state
bit cache 91 is organized into eight word-wide
banks accessible in parallel. Each location 152
spans across all eight banks such that each bank
stores thirty-two words of location 152. (The
organization of the state bit cache is similar to the
organization of local frame cache 90, as described
in detail above.) In accordance with the present
invention, state indicators 148 are inpaged from
main storage to state bit cache 91 (i.e., state field
28 of data entry 48 is copied) in parallel with the
copying of datum 50 to local frame cache 90.

The state bits are loaded into or read from the
state bit cache in a manner similar to that
described above for the local frame cache. In
particular, as shown in FIGS. 10a, 10b, each of the
addresses obtained (e.g., by appending the local
frame cache physical pointer on the left of the
summation of the particular frame offset and the
index, if needed) and used to select a datum 132
(either a source operand or a destination location)
from local frame cache 90 is also used to select an
associated state bit indicator 148 from state bit
cache 91. Each state bit indicator 148 represents
the current state of a particular datum. A particular
datum and its associated state bit indicator are
selected in parallel from local frame cache 90 and
state bit cache 91, respectively, using the process
described above. However, it will be apparent to
one of ordinary skill in the art that the state bit
cache may be organized in a number of ways and
that only one embodiment is described herein. It
will also be apparent to one of ordinary skill in the
art that it is also possible to eliminate the state bit
cache and place the state bit indicators within the
local frame cache, e.g., adjacent to its associated
datum.

State bit indicators may have a number of
states (as one example, empty, waiting or present)
and when an operand and an associated state
indicator are selected from local frame cache 90
and state bit cache 91, respectively, the next state
for each selected operand is determined. In
addition, when a result is to be written to a location, the
next state for that location is determined. In one
embodiment, in order to determine the next state of
an operand or a result location, a plurality of state
transition tables and a state function associated
with each specifier is used. In particular, located in
instruction 120 is a state function 154 for
destination specifier 126, a
state function 156 for source operand 0 specifier
128 and a state function 158 for source operand 1
specifier 130. Each of the state functions is used to
indicate the synchronization function (described
above) associated with its specific specifier and
each state function is used as an address into a
state transition table. In one embodiment, there is a
state transition table for each specifier. That is,
there is a state transition table 160 associated with
destination specifier 126, a state transition table
162 associated with source operand 0 specifier 128
and a state transition table 164 associated with
source operand 1 specifier 130. Located within
each of the state transition tables is an entry 165
which includes the possible next states 166 for
each of the possible state functions. For example, if
state function 154 represents a synchronizing
function of One-Time Producer, Multiple Consumer,
then located within state transition table 160 (state
function 154 indexes into state transition table 160)
is entry 165 including the possible next states for
that synchronizing function. In one example, each
entry may include eight possible next states.
Further, if state function 154 could represent another
synchronizing function, such as Multiple Producer,
Single Consumer, then there would be another
entry within state transition table 160 containing the
possible next states for that synchronizing function.
Similarly, state transition tables 162 and 164
include entries 165 which contain the possible next
states 166. Each state transition table is, e.g.,
located within processing element 14 and may be
statically altered at system initialization in any
known manner to include additional entries of next
states which support further synchronizing
functions.

As shown in FIG. 10b, associated with each
next state 166 is a 3-bit control flag 168. Control
flag 168 is set at system initialization and is fixed
for its associated next state. Control flag 168 is
used in indicating to the processing element which
action is to be taken for the thread which includes
instruction 120. That is, control flag 168 indicates,
for instance, whether execution of the thread is to
be continued or whether execution is to be
deferred (explained below).

Referring to FIG. 12, in operation, a process to
be executed is dispatched by interconnection
network 18 to one of processing elements 14, STEP
180 "Dispatch Process." Subsequent to receiving
the dispatched process, a decision is made within
the processing element as to whether the process
is to be placed on the invocation context queue
located within the main memory control unit which
is associated with the particular processing element
or on ready queue 84 located within the processing
element, "[...] Incoming Process."

In particular, in deciding where to place the
process, an initial inquiry is made as to whether the
process is to be enqueued on ready queue 84 in a
first in-first out manner, INQUIRY 184 "[...]
Enqueued FIFO?" Should the process be
enqueued in a first in-first out manner, then a check
is made to see if the ready queue is full and
therefore, cannot accept any more processes,
INQUIRY 186 "Ready Queue Full?" If the ready
queue is full, the process is placed onto the tail
end of the invocation context queue in main
storage until a position is available in the ready queue,
STEP 188 "Enqueue onto ICQ." When placing a
process on the tail of the invocation context queue,
invocation context queue backward pointer 44
located within invocation context map entry 26 of the
process being added is replaced with the current
value of the invocation context queue tail. In
addition, invocation context queue forward pointer 42 of
the last process identified by the old tail is updated
to indicate the new tail of the invocation context
queue, which is equal to the local frame pointer of
the process being added. Further, the invocation
context queue tail is set equal to the local frame
pointer of the process being added, STEP 189
"Update ICQT and ICQH."
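
The pointer updates for placing a process on the tail of the invocation
context queue can be sketched in Python as follows. The class and attribute
names are hypothetical, processes are identified by their local frame
pointers as in the text, and the handling of an initially empty queue
(setting the head as well as the tail) is an assumption of the sketch.

```python
class InvocationContextQueue:
    """Doubly linked queue keyed by local frame pointer (illustrative)."""

    def __init__(self):
        self.head = None   # invocation context queue head
        self.tail = None   # invocation context queue tail
        self.entries = {}  # local frame pointer -> forward/backward pointers

    def enqueue_tail(self, local_frame_pointer):
        # Backward pointer 44 of the new process takes the current tail.
        self.entries[local_frame_pointer] = {
            "forward": None,
            "backward": self.tail,
        }
        if self.tail is None:
            # Assumption: on an empty queue the head is set as well.
            self.head = local_frame_pointer
        else:
            # Forward pointer 42 of the old tail indicates the new tail.
            self.entries[self.tail]["forward"] = local_frame_pointer
        # The tail is set equal to the local frame pointer being added.
        self.tail = local_frame_pointer

# Usage: two processes arrive while the ready queue is full.
icq = InvocationContextQueue()
icq.enqueue_tail(0x100)
icq.enqueue_tail(0x200)
```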

Returning to INQUIRY 186, if, however, the
ready queue is not full, then the process is added
to the tail end of ready queue 84, STEP 190
"Enqueue onto the Ready Queue." In addition to
loading the process onto the ready queue, one or
more threads associated with the process are
enqueued onto local continuation queue 86, STEP
191 "Place Thread on LCQ." Subsequently, in
order to indicate that there are valid entries in the
local continuation queue for the process on the
ready queue, local continuation queue head 38 and
tail 40 are copied from invocation context map
entry 26 to local continuation queue head 102 and
tail 103 located in ready queue entry 94 designated
for that process.

When a process is placed on the ready queue,
ready queue state 95 located within ready queue
entry 94 is updated from empty to pending pretest,
STEP 192 "RQ State is Updated."

Referring back to INQUIRY 184, should a
process be enqueued onto ready queue 84 in a last
in-first out fashion, then the process is enqueued
onto the head of the ready queue with the
possibility of replacing a valid ready queue entry 94 at the
tail of the ready queue, STEP 190 "Enqueue onto
Ready Queue." Once again, when the process is
added to the ready queue, threads for that process
are placed on local continuation queue 86, STEP
191 "Place Thread on LCQ" and ready queue state
95 is updated to pending pretest, STEP 192 "RQ
State is Updated." A valid ready queue entry which
is replaced is castout to the invocation context queue
in main storage. When a process is added to the
head of the invocation context queue, invocation
context queue
forward pointer 42 for the new process is updated
to point to the old head of the invocation context
queue. In addition, invocation context queue
backward pointer 44 of the old head is updated to point
to the process being added (using [...]
pointer). Further, the invocation context queue head
is updated to point to the new process represented
by the local frame pointer. Also, local continuation
queue head 102 and tail 103 are copied from ready
queue entry 94 to local continuation queue head 38
and tail 40 in invocation context map entry 26.

As previously mentioned, when a process is
added to the ready queue, the state of the ready
queue entry is updated from empty to pending
pretest. During pending pretest, the availability of
the resources required for execution of the process
is determined, INQUIRY 194 "Are Resources
Available?" In particular, code frame cache 88 is
checked to see whether code frame 108 as
indicated by code frame pointer 99 in ready queue
entry 94 is located within the code frame cache.
Similarly, local frame cache 90 is checked to
determine if local frame 131 as indicated by local frame
pointer 96 in ready queue entry 94 is located within
the local frame cache. Should it be determined that
code frame 108 or local frame 131 is not present
within its respective cache and, therefore, is not
available to the process during processing, the
missing code frame and/or local frame is inpaged
from main storage and thereby made available,
STEP 196 "Prefetch Resources." In particular,
code frame 108 is copied from code frame 22 in
main storage to code frame cache 88 (inpaging).
Further, local frame 131 is copied from datum 50
located within local frame 20 in main memory
control units 12 to local frame cache 90 and in
parallel, state indicator 28 which is associated with
the datum is inpaged from main memory control
units 12 (i.e., local frame 20) to state bit cache 91.
The moving of data between main memory control
units and one or more caches allows for the
number of processes and threads which can be
executed by the processing element to be bound
only by the size of main storage and not by a finite
amount of local storage. During inpaging, ready
queue state 95 is updated from pending pretest to
prefetch active, STEP 198 "Update RQ State."

Subsequent to inpaging the resources during
prefetch or if an affirmative response is obtained
from INQUIRY 194, ready queue state 95 is
updated from prefetch active to ready indicating that
the process is ready for execution by the
processing element, STEP 200 "Update RQ State." A
ready process may be executed by the processing
element when the process is the top
entry in ready queue 84. When a ready process is
selected for execution, the top thread located in
local continuation queue 86 is selected, STEP
"Select Process and Thread." When this occurs,
ready queue state 95 is updated from ready to
running, STEP 204 "Update RQ State." In addition,
the state of the previous running ready queue entry
is changed from running to empty, ready, or
sleeping (all of which are described above) depending
on the conditions for which it relinquishes control of
processing within the processing element.
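
The ready queue state changes walked through above can be summarized with
a small Python sketch. The enumeration and function names are hypothetical,
and the direct move from pending pretest to ready (for the case in which
the resources are already available) is an assumption, as is the omission
of any transition out of the sleeping state, which this passage does not
describe.

```python
from enum import Enum

class RQState(Enum):
    EMPTY = "empty"
    PENDING_PRETEST = "pending pretest"
    PREFETCH_ACTIVE = "prefetch active"
    READY = "ready"
    RUNNING = "running"
    SLEEPING = "sleeping"

# Transitions named in the text; PENDING_PRETEST -> READY is an assumption.
ALLOWED = {
    RQState.EMPTY: {RQState.PENDING_PRETEST},
    RQState.PENDING_PRETEST: {RQState.PREFETCH_ACTIVE, RQState.READY},
    RQState.PREFETCH_ACTIVE: {RQState.READY},
    RQState.READY: {RQState.RUNNING},
    RQState.RUNNING: {RQState.EMPTY, RQState.READY, RQState.SLEEPING},
}

def advance(current, new):
    """Return the new state, or raise if the change is not listed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"{current.value} -> {new.value} is not listed")
    return new

# Usage: a process is enqueued, prefetched, made ready and then run.
state = RQState.EMPTY
for step in (RQState.PENDING_PRETEST, RQState.PREFETCH_ACTIVE,
             RQState.READY, RQState.RUNNING):
    state = advance(state, step)
```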

The selected thread (or local continuation
queue entry 104) from local continuation queue 86
includes code offset 105 which is used, as
described above, in selecting an instruction 120 from
code frame cache 88 to be executed, STEP 206
"Fetch Instruction." When instruction 120 is
fetched, local continuation queue head pointer 102
located in ready queue entry 94 is adjusted to
indicate the removal of the processing thread from
the local continuation queue, STEP 208 "Adjust
LCQ Head."

As described above, the instruction which is
selected includes source operand 0 specifier 128
which is used to select datum 132 representative of
a source operand 0 from local frame cache 90 and
its associated state bit 148 from state bit cache 91.
Also, source operand 1 specifier 130 is used to
select datum 132 representative of a source
operand 1 from local frame cache 90 and its
associated state bit 148 located in state bit cache 91,
STEP 210 "Select Data and State Bits."



In addition to the above, state functions 156
and 158 located in source operand 0 specifier 128
and source operand 1 specifier 130, respectively,
are used in selecting a number of possible next
states 166 from state transition tables 162, 164. In
particular, state function 156 is used as an address
into state transition table 162 to select entry 165
which includes the next states for source operand 0
specifier. Similarly, state function 158 is used as an
address into state transition table 164, to select
entry 165 which includes the next states for source
operand 1 specifier, STEP 212 "Select Possible
Next States." (As described above, each state
function is representative of a synchronizing
function and the states associated with each
synchronizing function are included in the state transition
tables.)

Subsequent to selecting the possible next
states for a source operand, the current state (state
indicator 148) of the operand is used in choosing
one state from the possible next states which
represents the next state for the operand. For
example, if there are eight possible next states and the
value of state bit indicator 148 is zero, then the
next state for state indicator 148 is the state
located at position 0 of the eight next states (i.e.,
column 0 or the first next state out of the eight
states), STEP 214 "Determine Next State." In one
embodiment, depending on the particular synchronizing
function, it may be that state bit indicator 148
represents a present state for an operand which has
been read and the possible next states for a
particular synchronizing function are empty, waiting
and present. In one example, the next state to be
selected for that operand is the present state. After
the next state is determined, state indicator 148 is
updated to the value of the next state, e.g., by
writing the value of the next state into the current
state value located in state bit cache 91, STEP 216
"Update Current State."
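
The two-level selection described above (the state function addresses an
entry, and the current state picks one of the possible next states in that
entry) can be sketched in Python as follows. The numeric state encodings,
the string names for the control flag, the dictionary key, and the table
contents for the empty and waiting states are all hypothetical; only the
rule that a present operand which is read stays present comes from the
text.

```python
# Hypothetical 3-bit encodings for the example states.
EMPTY, WAITING, PRESENT = 0, 1, 2
# Hypothetical meanings for control flag 168.
CONTINUE, DEFER = "continue", "defer"

# One state transition table for a source operand specifier. Each entry
# holds eight (next state, control flag) slots, one per current state.
SOURCE_OPERAND_TABLE = {
    "one_time_producer_multiple_consumer": [
        (WAITING, DEFER),     # current state EMPTY (illustrative)
        (WAITING, DEFER),     # current state WAITING (illustrative)
        (PRESENT, CONTINUE),  # current state PRESENT stays present
        None, None, None, None, None,  # unused encodings
    ],
}

def next_state(table, state_function, current_state):
    """Select the entry by state function, then the slot by current state."""
    entry = table[state_function]
    slot = entry[current_state]
    if slot is None:
        raise ValueError("no next state defined for this current state")
    return slot

# Usage: reading an operand that is already present.
state, action = next_state(
    SOURCE_OPERAND_TABLE, "one_time_producer_multiple_consumer", PRESENT)
```

Keeping the control flag next to each next state mirrors the description of
the flag being fixed for its associated next state at system
initialization.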

In addition to the above, a determination is
made as to the course of action to be taken by the
thread which includes the instruction being
executed, STEP 218 "Determine Action to be Taken."
Types of actions which may be taken include, for
instance, continue with thread execution, suspend
current thread and awaken deferred thread, each of
which are explained below.

In one embodiment, in order to determine what
action is to be taken by the thread, an inquiry is
made into whether the data (e.g., source operand 0
and/or source operand 1) located in local frame
cache 90 and selected by executing instruction 120
is available for use by the instruction, INQUIRY 220
"Is Data Available?" In particular, this determination
is made by checking state indicator 148 (before it
is updated to the next state) associated with each
of source operands 0 and 1. Should state indicator
148 indicate, for instance, an operand is in an
empty state, then that operand is considered
unavailable. If, however, state indicator 148 indicates
that each operand is in, for example, a present
state, then the operands are considered available.
If the data is available, then execution of the thread
continues and the result of the executing instruction
is stored in a result location within local frame
cache 90, STEP 222 "Continue Execution," as
described in detail herein.

In one example, instructions are executed
within execution unit 92, which is coupled to local
frame cache 90 within processing element 14.
Addressing mode 78 of each source operand specifier
located in instruction 120 gates the appropriate
data, such as source operand 0 and source
operand 1 (which has been obtained as described
above), into input registers (not shown) of
execution unit 92. Execution unit 92 executes the
instruction using the obtained operands and places the
result in a destination (or result) location
within local frame cache 90 as indicated by
destination specifier 126 of instruction 120. If, however,
the result of the instruction execution is a branch to
a specific location, then
a new compressed continuation descriptor is
enqueued onto local continuation queue 86 (if the new
thread is for the same process
being executed) or a new thread may be dispatched
by interconnection network 18 and enqueued onto
a different process' local continuation queue.

On the other hand, if the answer to INQUIRY
220 is in the negative and one or more of the
source operands are not available (e.g., the state
indicator associated with that operand indicates the
operand is not in a present state), then execution of
the thread associated with the executing instruction
is deferred, STEP 224 "Defer Execution of
Thread." (In one example, the particular instruction
continues executing, but the results [...].)
In particular, if source operand 0 or source operand
1 is in, for example, a state of empty or waiting and
therefore unavailable (if both operands are unavailable,
then in one embodiment, operand zero is
preferred over operand one), then the thread currently
executing (represented by code offset 105
and index 106 in local continuation queue entry
104 within local continuation queue 86) is suspended
until source operand 0 and source operand
1 (if both are needed) are available. When suspension
occurs, any effects of the instruction are nullified.

In order to suspend execution of a thread, code
offset 105 and index 106 (also referred to as the
compressed continuation descriptor) located within
the local continuation queue are stored in the datum
location (or field) representative of the unavailable
source operand. Each datum 132 may receive
a number of compressed continuation descriptors
corresponding to a number of threads. In one example,
each datum may store four compressed
continuation descriptors.
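
The suspension step can be illustrated with a short sketch. The names Datum
and defer_thread are hypothetical, the compressed continuation descriptor is
modeled as a plain (code offset, index) tuple, and the limit of four
descriptors follows the example above. The text does not say what happens
when a datum is already full, so the sketch simply raises an error in that
case; marking the datum as waiting once it holds a descriptor is likewise an
assumption of this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum

MAX_DESCRIPTORS = 4  # capacity taken from the example above


class State(Enum):
    EMPTY = 0
    WAITING = 1
    PRESENT = 2


@dataclass
class Datum:
    state: State = State.EMPTY
    # Deferred threads, each a (code_offset, index) pair.
    descriptors: list = field(default_factory=list)


def defer_thread(descriptor, operands):
    """Park `descriptor` in the first unavailable operand and return it.

    Scanning in order gives operand 0 preference over operand 1.
    """
    target = next((d for d in operands if d.state is not State.PRESENT), None)
    if target is None:
        raise ValueError("all operands are available; nothing to defer on")
    if len(target.descriptors) >= MAX_DESCRIPTORS:
        # Overflow handling is not described in the text; assumed here.
        raise OverflowError("datum already holds four descriptors")
    target.descriptors.append(descriptor)
    target.state = State.WAITING  # assumed: a datum holding descriptors waits
    return target


op0 = Datum(state=State.PRESENT)
op1 = Datum()  # empty, so the thread must wait on it
parked = defer_thread((0x40, 3), [op0, op1])
print(parked is op1, op1.state.name, op1.descriptors)
```

In the described hardware the descriptors occupy the datum field itself; the
separate descriptors list is only a modeling convenience.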

When data is to be written to a datum location
132 within local frame cache 90, the result location
and its associated state indicator are specified, as
described in detail above, by frame offset 147 of
destination specifier 126 located within code frame
cache 88, local frame cache physical pointer 97
and any corresponding index 106, STEP 226 "Data
is to be Written" (FIG. 13). In addition to selecting
the location and the state indicator, state function
154 located in destination specifier 126 is used as
an address into state transition table 160 to select
a number of possible next states 166 for the result
location (similar to the procedure described above
for selecting the next state for the source
operands). As described above, subsequent to selecting
the possible next states for the result location,
the current state indicator 148 for that location
is used to choose the next state [...].
The current state indicator is then updated to reflect
the value of the next state.

In addition, a determination is
made as to whether the location is empty,
INQUIRY 228 "Is Location Empty?" (When data is
to be written, a read/write [...] is performed: the
location is initially read to determine what is
stored there before data is written to the location.)
Should chosen datum 132 be empty (as indicated
by state indicator 148 before it is updated to the
next state), then the data is written to that location,
STEP 230 "Write Data." On the other hand, if the
location is not empty, a determination is made as
to whether there is one or more compressed continuation
descriptors stored within the location and,
therefore, the location is in a waiting state (again,
as indicated by state indicator 148), STEP 231 "Is
Location in Waiting State?" If the location is not in
the waiting state, then the data is written, STEP
234 "Write Data." If, however, that location is in a
waiting state, then any and all compressed continuation
descriptors stored in that location are removed
and enqueued onto the local continuation
queue associated with the running process before
the data is written, STEP 232 "Awaken Compressed
Continuation Descriptors." Subsequent to
removing the compressed continuation descriptors,
the data is written to the indicated location, STEP
234 "Write Data."
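
The write path of FIG. 13 can be sketched as follows. The names Datum and
write_datum are hypothetical, the local continuation queue is modeled as a
deque, and setting the location to the present state after the write is an
assumption; in the description the next state is chosen through the state
transition table.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    EMPTY = 0
    WAITING = 1
    PRESENT = 2


@dataclass
class Datum:
    state: State = State.EMPTY
    value: object = None
    # Compressed continuation descriptors of threads deferred on this datum.
    descriptors: list = field(default_factory=list)


def write_datum(datum, value, local_continuation_queue):
    """Write `value`, first waking any threads deferred on the location."""
    if datum.state is State.WAITING:
        # STEP 232: remove every descriptor and enqueue it for execution.
        local_continuation_queue.extend(datum.descriptors)
        datum.descriptors.clear()
    # STEP 230 / STEP 234: store the data.
    datum.value = value
    datum.state = State.PRESENT  # assumed next state


lcq = deque()
waiting = Datum(state=State.WAITING, descriptors=[(16, 0), (32, 1)])
write_datum(waiting, 99, lcq)
print(list(lcq), waiting.value, waiting.state.name)
```

Awakened descriptors are enqueued before the value is stored, mirroring the
order of STEP 232 and STEP 234.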

In one specific embodiment, each next state
resident within state transition tables 160, 162, 164
has an associated 3-bit control flag 168 which is
retrieved when the possible next states are retrieved.
When one of the next states is selected, as
described above, the associated control flag 168 is
also selected and is used to indicate what action is
to be taken by the processing element. That is, the
control flag indicates, for example, whether thread
execution is to be continued, whether execution of
the thread is to be deferred or whether a deferred
thread is to be awakened. Each of these actions is
performed in the manner described above.
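
A table-driven version of this selection might look like the sketch below.
Everything concrete in it is assumed for illustration: the bit assignment of
the 3-bit control flag, the state function names "read" and "write", and the
table contents are not given in the text, and the several state transition
tables are collapsed into one dictionary.

```python
from enum import Enum, IntFlag


class State(Enum):
    EMPTY = 0
    WAITING = 1
    PRESENT = 2


class Control(IntFlag):
    """3-bit control flag; this bit assignment is an assumption."""
    CONTINUE = 0b001
    DEFER = 0b010
    AWAKEN = 0b100


# Hypothetical table: state function -> {current state: (next state, flag)}.
TRANSITIONS = {
    "read": {
        State.EMPTY: (State.WAITING, Control.DEFER),
        State.WAITING: (State.WAITING, Control.DEFER),
        State.PRESENT: (State.PRESENT, Control.CONTINUE),
    },
    "write": {
        State.EMPTY: (State.PRESENT, Control.CONTINUE),
        State.WAITING: (State.PRESENT, Control.AWAKEN | Control.CONTINUE),
        State.PRESENT: (State.PRESENT, Control.CONTINUE),
    },
}


def step(state_function, current):
    """Select the candidate states by state function, then pick by current state."""
    return TRANSITIONS[state_function][current]


next_state, flag = step("write", State.WAITING)
print(next_state.name, bool(flag & Control.AWAKEN))
```

The two-stage lookup follows the description: the state function narrows the
table to a set of possible next states, and the current state picks one entry
together with its control flag.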

Although preferred embodiments have been
depicted and described in detail herein, it will be
apparent to those skilled in the relevant art that
various modifications, additions, substitutions and
the like can be made without departing from the
spirit of the invention and these are therefore considered
to be within the scope of the invention as
defined in the following claims.

Claims

1. A method for synchronizing execution by a
processing element of threads within a process,
said method comprising the steps of:
executing a thread within a process;
fetching during said thread execution from a
local frame cache a datum field;
fetching from a state bit cache a state indicator,
said state indicator being associated with
said datum field and having a first state value;
determining based on said first state value
whether said datum field includes a datum
available for use by said thread; and
deferring execution of said thread when said
datum is unavailable.

2. The method of claim 1, further including the
step of determining a second state for said
state indicator, said second state replacing
said first state during said thread execution.

3. The method of claim 2, wherein said second
state determining step includes the steps of:
selecting from said thread a state function to
be used in determining said second state;
using said state function to select from one of
a plurality of tables N possible second states
for said indicator; and
using said first state to choose from said selected
N possible states said second state for
said indicator.

4. The method of claim 3, wherein said first state
represents a current state of said datum and
said second state represents a next state of
said datum.

5. The method of one of claims 1 to 4, wherein
said thread is represented by a continuation
descriptor and said deferring step includes the
step of storing said continuation descriptor
within said datum field.



6. The method of claim 5, wherein said continuation
descriptor is compressed before being
stored in said datum field and said datum field
can receive a plurality of said compressed
continuation descriptors.

7. The method of claim 6, further including the
step of awakening said deferred thread when
datum is available for said thread.

8. The method of claim 7, wherein said awakening
step includes the step of removing said
compressed continuation descriptors from said
datum field.

9. The method of claim 8, wherein said removed
continuation descriptors are stored on a queue
local to said processing element.

10. The method of claim 8 or 9, wherein said
awakening step further includes the step of
storing said available datum in said datum field
when said compressed continuation descriptors
are removed.

11. A system for synchronizing execution by a
processing element of threads within a process,
said system comprising:
a local frame cache, said local frame cache
including a datum field;
a state bit cache, said state bit cache including
a state indicator corresponding to said datum
field;
means for executing by said processing element
a thread within a process;
means for fetching from said local frame cache
said datum field and from said state bit cache
said state indicator having a first state value;
means for determining based on said first state
value whether said datum field includes a datum
available for use by said thread; and
means for deferring execution of said thread
when said datum is unavailable.

12. The system of claim 11, further comprising
means for determining a second state for said
state indicator, said second state replacing
said first state during said thread execution.

13. The system of claim 12, wherein said second
state determining means comprises:
means for selecting from said thread a state
function to be used in determining said second
state;
a plurality of tables each having N possible
second states for said indicator;
means for using said state function to select
from one of said plurality of tables said N
possible second states; and
means for using said first state to choose from
said selected N possible states said second
state for said indicator.

14. The system of claim 13, further comprising a
main storage, said main storage comprising a
copy of said datum field and a copy of said
state indicator, said copy of said datum field
being copied from main storage to said local
frame cache and said copy of said state indicator
being copied from said main storage to
said state bit cache.



15. A method for synchronizing execution by a
processing element of threads within a process,
each of said threads including a plurality
of instructions, said method comprising the
steps of:
executing an instruction of said thread;
fetching during said instruction execution from
a local frame cache at least one source
operand and from a state bit cache at least
one state indicator having a first state value,
said at least one state indicator corresponding
to said at least one source operand;
fetching from said instruction at least one state
function associated with said at least one
fetched source operand;
using said at least one state function to select
from one of a plurality of tables N possible
second states for said at least one state indicator,
each of said second states having a corresponding
flag indicator;
using said first state to choose from said selected
N possible states a second state for
said state indicator;
replacing said first state with said second state
during said thread execution; and
having said thread perform one of a plurality of
actions after said instruction execution, said
action being specified by said flag indicator
associated with said chosen second state.

16. The method of claim 15, wherein said plurality
of actions includes the following actions: continuing
execution of said thread, deferring execution
of said thread and awakening a deferred
thread.

17. The method of claim 16, wherein said thread is
represented by a continuation descriptor and
said deferring execution action includes the
step of storing said continuation descriptor
within a source operand.

18. The method of claim 17, wherein said continuation
descriptor is compressed before being
stored and said source operand can receive a
plurality of compressed continuation descriptors.

19. The method of claim 18, wherein said awakening
action includes the step of removing a
compressed continuation descriptor from a
source operand.

20. The method of claim 19, wherein said awakening
action further includes the step of storing a
source operand when said compressed continuation
descriptor is removed.

21. The method of claim 20, wherein said removed
continuation descriptor is stored on a queue
local to said processing element.

22. A system for synchronizing execution by a
processing element of threads within a process,
each of said threads including a plurality
of instructions, said system comprising:
means for executing an instruction of said
thread;
a local frame cache, said local frame cache
including a plurality of source operands;
a state bit cache, said state bit cache including
a plurality of state indicators, each of said state
indicators having a first state value,
wherein one of said state indicators corresponds
to one of said source operands;
means for fetching during said instruction execution
from said local frame cache at least
one source operand and from said state bit
cache at least one corresponding state indicator;
means for fetching from said instruction at
least one state function associated with said at
least one source operand;
a plurality of tables each having N possible
second states for each of said state indicators;
means for using said at least one state function
to select from one of said plurality of tables
said N possible second states, each of said
second states having an associated flag indicator;
means for using said first state to choose from
said selected N possible states a second state
for said state indicator;
means for replacing said first state with said
second state; and
means for having said thread perform one of a
plurality of actions, said action being specified
by said flag indicator associated with said chosen
second state.



[Drawing sheets. The drawings are not recoverable from the extracted text;
the legible labels are listed below.]

System overview: processing elements PE 0 to PE 1023, interconnection
network, main memory control units 0 to 2047.
Main memory control unit: code frame, local frame, work frame.
Frame and queue layouts: compressed continuation descriptors (CCD), frame
pointer, state bits (ST), instructions; LCQ head, LCQ tail, code frame cache
state, code frame cache physical pointer, local frame cache state, local
frame cache physical pointer, local frame pointer, RQ state.
FIG. 5: processing element with ready queue, local continuation queue, code
frame cache, local frame cache, state bit cache and execution unit.
FIG. 7: local continuation queue entries, each holding a code offset and an
index.
Ready queue and code frame cache directory: code frame pointer, code frame
cache physical pointer, code frame cache state.
Instruction format: op code, TC, destination specifier, source operand 0
specifier, source operand 1 specifier.
FIGS. 12a and 12b (flow chart): Dispatch Process; Determine Where to Place;
Enqueue onto ICQ; Update ICQT and ICQH; Enqueue onto Ready Queue; RQ State
is Updated; Place Thread on LCQ; Prefetch Resources; Update RQ State; Select
Process and Thread; Fetch Instruction; Adjust LCQ Head; Select Data and
State Bits; Select Possible Next States; Determine Next State; Update
Current State; Determine Action to be Taken; Is Data Available?; Continue
Execution; Defer Execution of Thread.
FIG. 13 (flow chart): Data is to be Written; Is Location in Waiting State?;
Awaken Compressed Continuation Descriptors; Write Data.



Europäisches Patentamt
European Patent Office
Office européen des brevets

Publication number: 0 565 849 A3

EUROPEAN PATENT APPLICATION



Application number: 93103580.2
Date of filing: 05.03.93

Int. Cl.5: G06F 9/44, G06F 9/46



Priority: 15.07.92 US 914686
14.04.92 US 868410

Date of publication of application:
20.10.93 Bulletin 93/42

Designated Contracting States:
DE FR GB

Date of deferred publication of the search report:
10.11.93 Bulletin 93/45

Applicant: International Business Machines
Corporation
Old Orchard Road
Armonk, N.Y. 10504 (US)

Inventor: Gregor, Steven Lee
628 Church Street
Endicott, New York 13760 (US)
Inventor: Iannucci, Robert Alan
400-F Brookside Drive
Andover, Massachusetts 01810 (US)

Representative: Schäfer, Wolfgang
IBM Deutschland Informationssysteme GmbH
Patentwesen und Urheberrecht
D-70548 Stuttgart (DE)

A method and system for synchronizing threads within a process.




A method and system is described for synchronizing
execution by a processing element of threads
within a process. Before execution of a thread commences,
a determination is made as to whether all of
the required resources for execution of the thread
are available in a cache local to the processing
element. If the resources are not available, then the
resources are fetched from main storage and stored
in one or more local caches before execution begins.
If the resources are available, then execution of the
thread may begin. During execution of the thread
and, in particular, an instruction within the thread, the
instruction may require data in order to successfully
complete its execution. When this occurs, a determination
is made as to whether the necessary data
is available. If the data is available, the result of the
instruction execution is stored and execution of the
thread continues. However, if the data is unavailable,
then the thread is deferred until the data becomes
available and a new thread is processed. When
deferring a thread, the thread is placed in the memory
location which is to receive the required data.
Once the data is available, the thread is removed
from the data location and placed on a queue for
execution and the data is stored in the location.






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 93 10 3580
Page 1

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of relevant
passages, followed by the claims to which it is relevant:

IEEE TRANSACTIONS ON COMPUTERS
vol. 38, no. 12, December 1989, NEW YORK, US
pages 1631 - 1644
T.E. ANDERSON 'The Performance Implications
of Thread Management Alternatives for
Shared-Memory Multiprocessors'
* page 1631, left column, line 1 - page
1632, right column, line 35 *
Relevant to claim: 1, 11, 15, 22

EP-A-0 381 655 (IBM)
* abstract; claim 1 *
* page 4, line 1 - line 31 *
Relevant to claim: 1, 11, 15, 22

US-A-3 573 736 (H.P. SCHLAEPPI)
* abstract *
* column 2, line 9 - column 3, line 2 *
Relevant to claim: 1, 11, 15, 22

SUPERCOMPUTING '88 PROCEEDINGS, November
1988, ORLANDO, FLA.
pages 360 - 367
H. DIETZ ET AL. 'CRegs: A New Kind of
Memory for Referencing Arrays and
Pointers'
* figure 2 *
Relevant to claim: 1, 11, 15, 22

CLASSIFICATION OF THE APPLICATION (Int. Cl.5): G06F9/44, G06F9/46
TECHNICAL FIELDS SEARCHED (Int. Cl.5): G06F

The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 09 SEPTEMBER 1993
Examiner: SCHARFENBERGER B.

CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another
    document of the same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or
    after the filing date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding
    document



EUROPEAN SEARCH REPORT

Application Number: EP 93 10 3580
Page 2

DOCUMENTS CONSIDERED TO BE RELEVANT

Category: P,A
ACM SIGPLAN NOTICES
vol. 27, no. 7, July 1992, NEW YORK
pages 55 - 67
S. JAGANNATHAN ET AL. 'A Customizable
Substrate for Concurrent Languages'
* page 56, left column, line 55 - page 58,
left column, line 5 *
* page 61, left column, line 37 - right
column, line 34 *
* page 65, right column, line 1 - page 66,
left column, line 37 *
Relevant to claim: 1, 11, 15, 22



